ABSTRACT


Social networking sites' explosive expansion as a means of
information sharing, communication, storage, and management has
attracted hackers who abuse the Web to exploit security flaws for
their own nefarious ends. Every day, online accounts are forged or
compromised. Online social networks (OSNs) are rife with
impersonators, phishers, scammers, and spammers who are difficult
to spot. Users who send unsolicited communications to a large
audience with the objective of advertising a product, enticing
victims to click on harmful links, or infecting users' systems purely
for financial gain are known as spammers. Many studies have been
conducted to identify spam profiles in OSNs. In this paper, we
discuss the methods currently used to identify spam Twitter users.
User-based features, content-based features, or a combination of
both can be used to identify spammers. The paper summarises the
features, methodologies, detection rates, and limitations (if any) of
the approaches for identifying spam profiles, primarily on Twitter.
spammers


INTRODUCTION 

A social networking site, according to Boyd et al.
[5], enables users to (a) create a profile, (b)
befriend a list of other users, and (c) view and
navigate their own and other users' friend lists.
Through the use of Web 2.0 technology, these
online social networks (OSNs) enable user
interaction. These social networking sites are
expanding quickly and changing how individuals
communicate with one another. In less than 8 years
they have transformed from a specialised area of
online activity into a phenomenon that attracts
millions of internet users. Online communities
bring people with similar interests together,
making it simpler for them to stay in touch with
one another. Sixdegrees.com, launched in 1997,
was the first social networking site, and
makeoutclub.com followed in 2000.
Sixdegrees.com and similar websites had a short
lifespan and quickly faded, while new websites like
MySpace, LinkedIn, Bebo, Orkut, and Twitter
found success. Facebook, a very well-known
website, was introduced in 2004 [5] and rapidly
rose to fame throughout the globe. OSNs' large
user numbers make them appealing targets
for spammers and malicious users. On social
media websites, spam can take many forms and is
difficult to identify. Anyone who has used the
Internet has encountered spam of some kind,
whether in emails, forums, or newsgroups.
Spam [18] is defined as the practice of sending
unsolicited bulk messages over electronic
messaging systems. OSNs have grown in
popularity and are now used as a platform for spam
distribution. Spammers want to send product ads to
users who are not connected to them. Some
spammers post URLs that lead to phishing
websites where users' sensitive information is
stolen. The detection of spam profiles in OSNs has
been the subject of numerous papers. However, no
review paper that consolidates the available
research has yet been published in this area. The
purpose of our paper is to examine the academic
research that has been done in this area
by various scholars and to highlight potential
directions for future research. This study examines
and compares the methods for identifying
spammers on Twitter. The remainder of this paper
is organised as follows: the approach used to
conduct this review is described in Section 2,
followed by a brief overview of security issues in
OSNs in Section 3. Spammers are defined in
Section 4 along with their motivations; Twitter and
its threats are introduced in Section 5; the
motivation for this survey is covered in Section 6;
the features that can be used for detection are
covered in Section 7; a comparative analysis of the
work done by various researchers is presented in
Section 8; research directions for new researchers
are given in Section 9; and the review is concluded
in Section 10.


METHODOLOGY 

The current methods for detecting spam profiles in
OSNs were surveyed through a systematic review
using a principled approach: we searched major
computer science research databases, namely IEEE
Xplore, ACM Digital Library, Springer Link,
Google Scholar, and Science Direct, for relevant
topics. We concentrated exclusively on studies
published after 2009, since social networks were
not conceptualised until 1997 [1] and only
afterwards gained widespread acceptance.
Facebook was introduced in 2004 [1] and quickly
gained popularity. It then took some time for
people to become accustomed to using these
networks for communication, and only after that
did the networks become targets of attack. Over
60 papers were found in the five databases
mentioned above. After reviewing all of the paper
titles and abstracts, we selected only the papers
deemed relevant to the current investigation.
Publications whose titles and abstracts related to
spam message detection or other unrelated areas
were eliminated, leaving 21 papers for evaluation.
The papers have been categorised according to the
main features used to identify spammers.


Through this paper, we attempt to assemble the
social networking papers we have read on
identifying spam accounts on Twitter. The list is
probably not comprehensive, but it gives shape to
the ongoing research on identifying social network
spammers. After reading this survey, new
researchers will find it easy to assess what
research has been done, when, and how the current
work may be extended to improve spam detection.
Wherever appropriate, we have included details
of the methodology used, the dataset used, the
features for spammer detection, and the
effectiveness of the approaches employed by
different authors.


The papers discuss, in particular, the ramifications
of spammers' interactions with members of social
networks as well as current methods for identifying
them.


SECURITY ISSUES IN OSNs 

Online social networking sites (OSNs) are
susceptible to security and privacy problems due to
the volume of user data that these sites process
daily. Social networking site users are vulnerable
to a range of attacks:

1. Viruses: spammers utilise social networks as a
distribution channel [19] for malicious files that
infect users' systems.


2. Phishing attacks: users' sensitive information is
obtained by an attacker pretending to be a
trustworthy third party [30].


3. Spam: users of social networks are bombarded
with spam messages by spammers [11].


4. Sybil (fake) attacks: an attacker creates a number
of false identities and poses as a real user in the
system to undermine the reputation of
trustworthy users in the network [20].


5. Social bots: a group of fictitious profiles created
to capture user information [32].


6. Cloning and identity theft attacks: attackers
create a profile of an already-existing user on the
same network or across different networks in an
effort to deceive the cloned user's friends [23].
If victims accept the friend requests sent by
these cloned identities, the attackers gain access
to their information.


These attacks place a burden on both users and
systems.


TYPES OF SPAMMERS 

Spammers are fraudulent users who put social
networks' security and users' privacy at risk by
contaminating the information shared by legitimate
users. Spammers fall into one of the following
categories [22]:

1. Phishers: users who behave like normal users but
are actually out to steal the personal information
of other genuine users.


2. Fake users: users who impersonate real users'
profiles in order to send spam to their friends or
other network users.


3. Promoters: users who send malicious links in
advertisements or other promotional material to
other users in an effort to collect their personal
data.


Spammers' motivations:

Spammers aim to promote pornography, spread
malware, launch phishing attacks, and harm the
reputation of the system.


TWITTER AS AN OSN

Introduction 

Twitter is a social networking website with 500
million active users [14] at the time of writing,
who use it to share information. It was launched
on March 21, 2006 [14]. Twitter's logo is a
chirping bird, hence the name of the website. Users
exchange frequent messages called "tweets",
which are up to 140 characters long and which
anyone can send or read. Tweets are public by
default and are shown to all those who follow the
tweeter. Tweets may contain news, opinions,
photos, videos, links, and messages. The following
standard Twitter terminology is relevant to our
work:


Tweets [3]: A Twitter message that is no longer 
than 140 characters. 


Followers and Followings [3]: Followers are users
who follow a specific user, while Followings are
the users that a specific user follows.


Retweet [3]: A tweet that has been forwarded to a 
user's entire following. 


Hashtags [3]: The # sign is used to annotate 
keywords or subjects in a tweet so that search 
engines may quickly find them. 


Mention [3]: You can include replies and mentions 
of other users in tweets by using the @ sign in 
front of their usernames. 


Lists [3]: Twitter offers a tool for grouping the 
persons you follow into lists. 


Direct Message [3]: Also known as a DM, this is
Twitter's mechanism that lets users communicate
privately.


According to Twitter policy [16], signs of spam
profiles include following a large number of users
in a short time, posting mostly links, using popular
hashtags (#) when posting unrelated information,
and repeatedly posting other users' tweets as one's
own. Users can report spam profiles to Twitter by
tweeting to @spam. However, the Twitter policy
[16] does not make clear whether administrators
rely on user reports or on automated processes to
detect these conditions, although it is assumed that
both approaches are used.


Threats on Twitter 

1. Spammed Tweets [13]: Twitter only allows
users to post tweets with a maximum of 140
characters, but despite this restriction,
cybercriminals have found a way to make
the most of it by creating short but
compelling tweets that include links to
promotions for free vouchers, job postings, or
other offers.


2. Malware downloads [13]: Cybercriminals
have used Twitter to share tweets with links to
websites from which malware is downloaded.
Examples include FAKEAV and backdoor [13]
programs, Twitter worms that sent direct
messages, and malware that attacked both
Windows and Mac operating systems.
KOOBFACE [13], malware that targeted both
Facebook and Twitter, is the most notorious.


3. Twitter bots [13]: Online criminals frequently 
utilise Twitter to run and manage botnets. 
These botnets threaten the security and privacy 
of the users by controlling their accounts. 


Social Implications of OSNs 

In addition to the typical problems that social
networking sites pose for users, such as spamming,
phishing attacks, malware infections, social bots,
viruses, etc., the biggest challenge is maintaining
the security and confidentiality of private data.


Twitter policy states that once an account follows
more than 2,000 users, the number of additional
users it can follow is constrained by the number of
followers the account has.


Social networking websites are created with the
intention of making information readily available
and accessible to others. Unfortunately,
cybercriminals exploit this readily accessible
information to launch targeted attacks.
Attackers can easily find a way to gain access to
a user's account in order to gather more information
and use that information to gain access to the user's
other accounts and the accounts of their friends.


MOTIVATING REVIEW 

Social networks have become a target for spammers
because of the ease of sharing information and of
staying up to date with current topics. It can be
challenging to identify such fraudulent users in
OSNs because spammers are well aware of the
methods available to identify them. For financial
gain, spammers can use OSNs as an ideal platform
to pose as legitimate users and attempt to convince
innocent users to click on malicious posts. How to
identify such users in order to safeguard the
network and protect users' private information is a
crucial area being researched by numerous experts.
Researchers will find this paper helpful for quickly
evaluating the work that has been done in this
field.


FEATURES DISTINGUISHING SPAMMERS
& NON-SPAMMERS IN TWITTER

The papers analysed in this study are shown in
Table 1, along with the type of features that were
utilised to identify spam Twitter profiles. Spam and
non-spam profiles can be distinguished by either
user-based or content-based characteristics. In any
social network, user-based features are the
characteristics of the user's profile and behaviour,
whereas content-based features are the
characteristics of the text that users publish.


Table 1 Features for the detection of spam profiles

Feature type | Attributes used for detection of spam profiles
User-based features | Demographic information such as profile information, number of followers and followings, followers-to-following ratio, reputation, account age, average time between tweets, posting habits, idle hours, tweet frequency, etc. [33,12,34,3,26]
Content-based features | Number of hashtags (#), number of URLs in tweets, @mentions, retweets, spam words, HTTP links, trending topics, duplicate tweets, etc. [33,7,11,25]
Both user-based and content-based features | [1,22,24,27,29,2,4]
Other features (such as graph connectivity or graph distance) | Graph-based features, neighbor-based features, interaction-based features, social links, social activities, and Markov clustering method [21,9,28,33,23,6]


Role of the aforementioned features in identifying spam profiles, in accordance with Twitter policy [16]:

1. Number of followers: spammers have fewer followers.

2. Number of followings: spammers tend to follow a large number of users.

3. Followers/following ratio: spammers have a ratio of less than 1.

4. Reputation: the ratio of followers to the sum of followers and followings. Spammers have a low
reputation.

5. Account age: calculated from the current date and the account's creation date. Since spammers
typically create new accounts, their account age is low.

6. Average time between tweets: in order to attract attention, spammers post tweets in quick succession,
so the average time between their tweets is short.

7. Posting time behaviour: spammers frequently post at fixed times, such as early in the
morning or late at night, when genuine users are not using social media.

8. Idle hours: spammers keep sending messages, so they have less idle time.

9. Tweet frequency: to attract other users' attention, spammers tweet more frequently and at unusual
hours.

10. Number of hashtags (#): spammers post numerous unrelated updates on the most popular Twitter
topics, using hashtags to entice genuine users to read their tweets.

11. URLs: spammers frequently tweet a large number of URLs pointing to malicious websites.

12. @mentions: in order to avoid detection, spammers include as many @usernames of unknown
users as possible in their tweets.

13. Retweets: retweets are replies to any tweet that contain the @RT symbol, and spammers frequently use
@RT in their tweets.

14. Spam words: the majority of spammers' tweets contain spam words.

15. HTTP links: spammers' tweets contain the largest number of www or http:// strings.

16. Duplicate tweets: spammers frequently post identical tweets addressed to many different @usernames.

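
The user-based features in the list above are simple arithmetic on profile counters. The following Python sketch illustrates how the followers/following ratio, reputation, account age, and tweet frequency could be computed. The function name `user_features`, its parameters, and the sample values are hypothetical and are not taken from any of the surveyed papers.

```python
from datetime import date


def user_features(followers, followings, created, today, tweet_count):
    """Compute a few user-based features from basic profile counters."""
    total = followers + followings
    age_days = (today - created).days
    return {
        # Followers/following ratio: expected to be below 1 for spammers.
        "ff_ratio": followers / followings if followings else float("inf"),
        # Reputation = followers / (followers + followings).
        "reputation": followers / total if total else 0.0,
        # Account age in days, from creation date to the current date.
        "account_age_days": age_days,
        # Tweet frequency, guarding against accounts created today.
        "tweets_per_day": tweet_count / max(age_days, 1),
    }


# Hypothetical account: few followers, many followings, very young, very active.
features = user_features(50, 950, date(2013, 1, 1), date(2013, 1, 11), 400)
print(features)
```

A profile with a low ratio, low reputation, low account age, and high tweet frequency matches the spammer pattern described in the list, although any thresholds would have to be learned from labelled data.
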
EXISTING METHODS FOR DETECTION OF SPAM PROFILES IN TWITTER 

Researchers have employed a variety of approaches to identify spam profiles in different OSNs. As
Twitter is used to discuss and disseminate information about trending topics in real time rather than just as
a social communication platform, we concentrate primarily on the work that has been done to identify
spammers on Twitter. The papers reviewed on the identification of spammers on Twitter are summarised
in Table 2.

In 2010, Alex Hai Wang [1] made significant progress in the area of spam profile detection using both
user-based and content-based features. A prototype spam detection system was presented to find suspicious
Twitter users. A directed social graph model was proposed to explore the "follower" and "friend"
relationships. In accordance with Twitter's spam policy, content-based and user-based features were used
with a Bayesian classification technique to facilitate spam detection. The performance of several
traditional classification techniques, including Decision Trees, Support Vector Machines (SVM), Naive
Bayesian, and Neural Networks, was compared using standard evaluation measures, and the Bayesian
classifier was found to perform best. With 2,000 users in the crawled dataset and 500 users in the test
dataset, the system attained 93.5% accuracy and 89% precision. The limitation of this method is that it
was evaluated only on a very small dataset of 500 users, taking into account only their 20 most recent
tweets.
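
To make the idea of a Bayesian classifier over numeric profile features concrete, the following Python sketch implements a small Gaussian Naive Bayes classifier. It is only an illustration: the Gaussian likelihood, the two features (reputation and fraction of tweets with URLs), and the toy training data are assumptions made here and do not reproduce the system of [1].

```python
import math
from collections import defaultdict


def train_gaussian_nb(rows, labels):
    """Estimate a class prior and per-feature mean/variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        prior = len(members) / len(rows)
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((x - mean) ** 2 for x in column) / len(column)
            stats.append((mean, var + 1e-9))  # small constant avoids zero variance
        model[label] = (prior, stats)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for one feature row."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data (assumed): each row is [reputation, fraction of tweets with URLs].
rows = [[0.10, 0.90], [0.20, 0.80], [0.15, 0.85],
        [0.80, 0.10], [0.90, 0.20], [0.85, 0.15]]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
model = train_gaussian_nb(rows, labels)
print(predict(model, [0.12, 0.88]))
```
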


Lee et al. [22] deployed social honeypots consisting of genuine profiles, which attracted suspicious users;
the honeypot's bot gathered evidence of spam by crawling the profiles of users who sent unsolicited
friend requests and URLs on Twitter and MySpace. Profile characteristics such as posting behaviour,
content, and friend information were used to build machine learning classifiers for identifying spammers.
The profiles of users who contacted these social honeypots on Twitter and MySpace via unsolicited friend
requests were collected and analysed, and spammers were identified using the LIBSVM classifier. One of
the strengths of the approach is its validation on two different dataset combinations: 10% spammers + 90%
non-spammers and 10% non-spammers + 90% spammers. A drawback of the approach is that only a small
dataset was used for validation.


Benevenuto et al. [7] identified spammers based on tweet content and user-based attributes. The
following tweet content attributes were used: the number of hashtags per word, the number of URLs per
word, the number of words per tweet, the number of characters per tweet, the number of hashtags per
tweet, the number of numeric characters in the text, the number of users mentioned in each tweet, and the
number of times the tweet has been retweeted. The features that best distinguish spammers from
non-spammers are the fraction of tweets that contain URLs, the fraction of tweets that contain spam words,
and the average number of words that are hashtags in the tweets. 54 million Twitter users were crawled,
and 1,065 users were manually labelled as spammers or non-spammers. A supervised machine learning
technique, the SVM classifier, was used to separate spammers from non-spammers. The system's
detection accuracy is 87.6%, with only 3.6% of non-spammers misclassified.
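
Most of the tweet content attributes listed above can be counted directly from the tweet text. The Python sketch below shows one possible way to do so. The tokenisation by whitespace, the URL pattern, and the function name `tweet_content_features` are simplifying assumptions made for illustration and are not the extraction procedure of [7]. The retweet count is omitted because it comes from tweet metadata rather than from the text.

```python
import re

# Simplified URL pattern (assumption): http(s) links and bare www. links.
URL_RE = re.compile(r"https?://\S+|www\.\S+")


def tweet_content_features(text):
    """Count simple content attributes of a single tweet."""
    words = text.split()
    n_words = len(words)
    hashtags = sum(1 for w in words if w.startswith("#") and len(w) > 1)
    mentions = sum(1 for w in words if w.startswith("@") and len(w) > 1)
    urls = len(URL_RE.findall(text))
    return {
        "chars": len(text),
        "words": n_words,
        "hashtags": hashtags,
        "mentions": mentions,
        "urls": urls,
        "digits": sum(ch.isdigit() for ch in text),
        "hashtags_per_word": hashtags / n_words if n_words else 0.0,
        "urls_per_word": urls / n_words if n_words else 0.0,
    }


print(tweet_content_features("Win 2 free #iphone now http://example.com/x @bob"))
```
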


Sending a message to "@spam" on Twitter enables users to report spam accounts to the company. Gee et al.
[12] took advantage of this feature and used a classification technique to find spam profiles. Spam and
regular user profiles were gathered using "@spam" in Twitter and the Twitter API, respectively. The
collected data was first represented in JSON and then converted to CSV format as a matrix in which users
are rows and features are columns. A Naive Bayes classifier trained on the CSV data had a 27% error
rate; an SVM classifier was then used, with an error rate of 10%. The spam profile detection accuracy is
89.3%. The limitations of this approach are that only simple features were used for detection and that the
precision is low, i.e. 89.3%; it has been suggested that any system should be deployed aggressively only
if its precision is more than 99%.
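
The evaluation measures quoted throughout this section (accuracy, error rate, precision) are all derived from the same confusion matrix. The following Python sketch shows the standard definitions on a small set of made-up labels. The function name and the example labels are hypothetical and do not correspond to any dataset in the surveyed papers.

```python
def classification_metrics(y_true, y_pred, positive="spam"):
    """Compute standard binary classification measures from label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "error_rate": (fp + fn) / len(pairs) if pairs else 0.0,
        # Precision: share of profiles flagged as spam that really are spam.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Recall: share of real spam profiles that were flagged.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Share of non-spammers wrongly flagged as spammers.
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]
print(classification_metrics(y_true, y_pred))
```

Precision matters most for aggressive deployment because a false positive means that a legitimate user is treated as a spammer.
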


McCord et al. [24] employed content-based features such as the number of URLs, replies/mentions,
retweets, and hashtags, as well as user-based features such as the number of friends and followers. Spam
profiles on Twitter were identified using classifiers including Random Forest, Support Vector Machine
(SVM), Naive Bayesian, and K-Nearest Neighbour. The Random Forest classifier, which gave the best
results compared with the SMO, Naive Bayesian, and K-NN classifiers, was validated on 1000 users with
95.7% precision and 95.7% accuracy. The limitations of this approach are that the dataset used is
unbalanced (Random Forest is typically used for unbalanced datasets), that the reputation feature gave
incorrect results for the considered dataset, failing to distinguish between spammers and non-spammers,
and that the approach has only been validated on a small sample.


Lin et al. [28] identified long-surviving spam accounts on Twitter using two distinct features: URL rate
and interaction rate. Most publications have used many different features, including the number of
followers, number of followings, followers-to-following ratio, tweet content, number of hashtags, URL
links, etc., to identify spam accounts. However, according to this study, these features are not very
effective at detecting spammers, so only two simple yet effective features, URL rate and interaction rate,
were employed. The URL rate is the ratio of tweets containing URLs to all tweets, while the interaction
rate is the ratio of tweets that interact with other users to all tweets. The Twitter API was used to crawl
26,758 accounts, and a J48 classifier was applied to 816 long-surviving accounts, achieving an accuracy of
86%. The limitation of the approach is that only two features were used to detect spam profiles; hence,
if spammers keep their URL rates and interaction rates low, the system will not work as intended.
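
The two rates are simple proportions over an account's tweets, as the Python sketch below illustrates. How [28] decides that a tweet interacts with another user is not specified in this survey, so the sketch assumes that any tweet containing an @mention (which also covers replies and "RT @user" retweets) counts as an interaction. The regular expressions and function names are likewise illustrative assumptions.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# An @ preceded by a word character (as in an email address) is not a mention.
MENTION_RE = re.compile(r"(?<!\w)@\w+")


def url_rate(tweets):
    """Fraction of tweets that contain at least one URL."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if URL_RE.search(t)) / len(tweets)


def interaction_rate(tweets):
    """Fraction of tweets that mention another user (assumed definition)."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if MENTION_RE.search(t)) / len(tweets)


tweets = [
    "Buy now http://spam.example/a",
    "@alice thanks!",
    "RT @bob: hello",
    "good morning",
]
print(url_rate(tweets), interaction_rate(tweets))
```
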


According to Amit A. et al. [2], there are two kinds of spammer detection systems: URL-centric systems,
which rely on identifying malicious URLs, and user-centric systems, which are based on features relating
to users, such as the followers/following ratio. The method used in this research is a hybrid one
that takes both kinds of features into account. Along with an alert system to identify spam
tweets, 15 new features have been proposed to catch spammers. Spammers' tweet campaigns and methods
have also been studied. A Twitter dataset with 500K users and another with 110,789 users
were both used. Bait-oriented features, which highlight the strategies used by spammers to get victims to
click on harmful links, include mentions of non-followers, trend hijacking, and trend intersection with
popular trends. Behavioural features include tweet interval variation, tweet volume variation, the ratio of
tweet interval variation to tweet volume variation, and tweeting sources. URL features include duplicate
URLs, duplicate domain names, and IP/domain ratio. Content entropy features include dissimilarity of
tweet content, similarity of tweets, and URL and tweet similarity. Profile features include the
follower/following ratio and the dissimilarity of the profile's description language. All of these features
were then collected for both malicious and benign users and, using the Weka tool, fed into
four supervised learning algorithms: Decision Tree, Random Forest, Bayes Network, and Decorate. The
Decorate classifier gave the best results, detecting 93.6% of spammers. It has been
demonstrated that this method performs better than Twitter's spammer detection strategy. However, this
method has only been tested on 31,808 users, whereas Twitter has to deal with millions of users.
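
As an illustration of the behavioural and URL features named above, the following Python sketch computes a tweet interval variation and a duplicate URL ratio. The exact definitions used in [2] are not given in this survey, so the sketch assumes that interval variation is the population variance of the gaps between consecutive tweets and that the duplicate URL ratio is the share of posted URLs that repeat an earlier one. Both function names are hypothetical.

```python
from statistics import pvariance


def tweet_interval_variance(timestamps):
    """Variance of gaps (in seconds) between consecutive tweets.

    Values near zero suggest machine-like, evenly spaced posting.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return pvariance(gaps) if gaps else 0.0


def duplicate_url_ratio(urls):
    """Share of posted URLs that repeat an already posted URL."""
    if not urls:
        return 0.0
    return 1 - len(set(urls)) / len(urls)


# Hypothetical account that tweets exactly once a minute with a repeated link.
print(tweet_interval_variance([0, 60, 120, 180]))
print(duplicate_url_ratio(["http://a.example", "http://a.example",
                           "http://b.example", "http://a.example"]))
```
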


Chakraborty et al. [4] presented a technique to identify abusive users who post offensive content,
including malicious URLs, pornographic URLs, and phishing links, and who thereby drive regular users
away from social networks and violate their privacy. The algorithm has two steps: the first checks the
profile of a user who sends a friend request to another user for offensive content, and the second checks
the similarity of the two profiles. After these two steps, the system recommends whether the user should
accept the friend request. The technique was tested on a Twitter dataset of 5000 users gathered using the
REST API. Timing-based, content-based, and profile-based features are all taken into account to
distinguish between abusive and non-abusive users. SVM, Decision Tree, Random Forest, and Naive
Bayesian classifiers were employed. SVM outperforms all the other classifiers, and the model achieves an
accuracy of 89%.


Yang et al. [6] used new features to identify spammers on Twitter. A number of evasion strategies used
by spammers were discussed. Ten new detection features were proposed, including three graph-based
features, three neighbor-based features, three automation-based features, and one timing-based feature.
These features are difficult and expensive for spammers to evade, because evading them requires more
time, money, and resources. With the help of classifiers such as Random Forest, Decision Tree, Decorate,
and Bayesian Network, 18 features (eight existing and ten new) were examined for detection purposes.
The Bayesian classifier gave the best accuracy, 88.6%. The limitations of this method are that very little
data has been crawled and that only a specific type of spammer is detected, with a low detection rate.


RESEARCH DIRECTIONS 

During the survey, it became clear that a lot of work has been done to identify spam profiles in
various OSNs. Even so, the detection rate can be improved by changing the approach and using more
robust features as the deciding factor. The following are a few findings from the survey:

1. Twitter has millions of active users, and this number is growing, yet almost all authors used a
relatively small testing dataset to evaluate the effectiveness of their approach. Therefore, the testing
dataset must be expanded in order to evaluate the effectiveness of any approach.

2. A multivariate model must be developed.

3. A technique that can identify all types of spammers must be developed.

4. It is necessary to test the methods using various mixtures of spammers and non-spammers.


Table 2 Outline of techniques used for the detection of spammers

Author | Metrics Used | Methodology Used | Dataset Used | Results
Alex Hai Wang [1] | Graph based and content based | Compared Naive Bayesian, Neural Network, SVM and Decision Tree | Validated on 500 Twitter users with 20 recent tweets | Naive Bayesian giving highest accuracy - 93.5%
Lee et al. [22] | User based | Compared Decorate, SimpleLogistic, FT, LogiBoost, RandomSubSpace, Bagging, J48, LibSVM | Validated on 1000 Twitter users | Highest accuracy - 88.98%
Benevenuto et al. [7] | User based and content based | SVM | Validated on 1065 Twitter users | Accuracy - 87.6% (with user based and content based features) and accuracy - 84.5% (with only user based features)
Gee et al. [12] | User based | Compared Naive Bayesian, SVM | Validated on 450 Twitter users with 200 recent tweets | Accuracy - 89.3%
McCord et al. [24] | User based and content based | Compared Random Forest, SVM, Naive Bayesian, K-NN | Validated on 1000 Twitter users with 100 recent tweets | Random Forest giving highest accuracy - 95.7%
Lin et al. [28] | URL rate, interaction rate | J48 | Validated on 400 Twitter users | Accuracy - 86%
Amit A. et al. [2] | Introduced 15 new features | Compared Random Forest, Decision Tree, Decorate, Naive Bayesian | Validated on 31,808 Twitter users | Accuracy - 93.6%
Chakraborty et al. [4] | User based, content based | Compared Random Forest, SVM, Naive Bayesian, Decision Tree | Trained on 5000 Twitter users with 200 recent tweets | SVM giving highest accuracy - 89%
Yang et al. [6] | 18 features (8 existing & 10 new features introduced) | Compared Random Forest, Decision Tree, Decorate, Naive Bayesian | Validated on two datasets - 5000 users and then 3500 users with 40 recent tweets | Bayesian giving highest accuracy - 88.6%


CONCLUSION

Researchers have developed and employed a variety
of techniques to identify spammers on various
social networks. As can be inferred from the
publications examined, the majority of the work
has been done using classification approaches like
SVM, Decision Tree, Naive Bayesian, and Random
Forest. User-based features, content-based
features, or a combination of both have been used
for detection. A few authors also introduced
new detection features. Almost all of the methods
were tried with only a single combination of
spammers and non-spammers and were validated
on a very limited dataset. Compared with using
only user-based or only content-based features,
combining features for the detection of spammers
has shown improved performance in terms
of accuracy, precision, recall, etc.